INTRODUCTION

The research reported was concerned with the development of
paper-and-pencil job knowledge tests for six positions in the Weapons
Branch of a Direction Center in the SAGE System. Actually, three
tests were developed, the positions being paired as follows: Senior
Director/Senior Director Technician (SD/SDT); Weapons Director/Weapons
Director Technician (WD/WDT); Intercept Director/Intercept Director
Technician (IND/INT).

The report treats the job description techniques, rationale for test
outline, item development, preliminary tryout in the New York Air
Defense Sector (NYADS), item analysis and test revision, and final
administration at the Boston and Syracuse Air Defense Sectors (BOADS
and SYADS) and at the training facility at Richards-Gebaur Air Force
Base (RG). Also included are information on test reliability and
validity, and recommendations for normative use of the tests.

The test materials, including the items and instructions for
administering, scoring, and interpreting the results, are printed in a
separate booklet.
TEST DEVELOPMENT

The  test  development  phase  of  the  contract  was  divided  into  four 
sub-phases:  job  description,  item-writing,  trial  administration  and 
test  revision,  and  test  analysis.  These  sub-phases  are  considered 
in  detail  below. 

JOB  DESCRIPTION 

The technique for describing the positions prior to the construction
of test items was that of the task-equipment analysis (TEA). As the
term implies, the analysis describes the relationship between the
equipment to be operated and the task of the operator. A detailed
description of the philosophy and methodology of the TEA may be found
in AFCRC-TN-59-76, "SAGE Task Equipment Analysis - Intercept
Director/Intercept Director Technician," February 1960, or in any of
the TEAs covering other positions in the SAGE Direction Center. 1

ITEM-WRITING 

Another contractor has developed paper-and-pencil tests for all
Direction Center operator positions other than those in the Weapons
Branch. 2 An examination of the TEAs for the SASO and SC in the
Combat Center revealed that these jobs were unsuitable as subjects
for paper-and-pencil test development. There were several reasons
why this was so. First, there was very little job content in
day-to-day operation, the tasks being mostly limited to decision-making
at the time of exercises or missions. Second, the number of incumbents
was insufficient for any psychometric analysis; hence, there would

1 Other Technical Notes published as TNs under the auspices of
the Operational Applications Laboratory, Air Force Cambridge Research
Center or the Operational Applications Office, Air Force
Command and Control Development Division.

2 See AFCRC-TN-58-63, 58-64, 58-65, 58-66, 58-62.
be no way of checking on the characteristics of the tests produced. For
these reasons it was determined, with the consent of OAL, that test
development would be limited to six positions in the Weapons Branch:
Senior Director (SD) and Senior Director Technician (SDT), Weapons
Director (WD) and Weapons Director Technician (WDT), and Intercept
Director (IND) and Intercept Director Technician (INT).

The  original  test  plan  involved  the  development  of  five  types  of 
test  items: 

1.  S  items,  involving  knowledge  of  Situational  Displays  and 
Digital  Display  symbology 

2.  C  items,  involving  knowledge  of  computer  capabilities  and 
functions 

3.  T items, involving knowledge of tactics, SOPs, and aircraft
and weapons capabilities

4.  D1 items, involving tactical decisions

5.  D2 items, involving decisions affecting areas broader than
tactics

It soon became evident that the D1 and D2 item types were not
readily amenable to a paper-and-pencil, multiple-choice format. Possible
items of these types were too few to constitute a realistic part of any
test. Accordingly, it was determined to limit the items to the S, C,
and T types.

For the S items, the use of photographic reproductions of SIDs
and DIDs was considered and rejected. Observation of the operators
indicated that they were able to read symbology very easily, even
though the dynamic quality of the displays might seem to pose a
serious problem. Thus, photographic reproduction might add a simulation
of reality unrelated to the actual task, i.e., the interpretation of
the symbology.
Using the TEA and other sources, approximately 350 multiple-choice
items were written. Many of these were, of course, appropriate
to several positions in the Weapons Branch. The actual distributions
of items by type among the positions are given in Table 1. Each
of the items developed was reviewed by a military subject matter
expert; an additional review was performed for psychometric propriety.

An attempt was made to anticipate situations which would make
items inapplicable or obsolete. A scheme was devised whereby each
item was classified according to its sector applicability and its
susceptibility to error because of program model, weapons, or tactics
changes. Whenever such changes occurred or the test was to be used in
a new sector, it was planned that the indicated items would be
reviewed. It became apparent, however, that it was not possible to
apply such classifications reliably. If such a system were used, it
would be quite possible that some "bad" items would be overlooked.
Instead, the wisest policy seemed to be to have a subject matter
expert review the entire test before it is applied in a particular
sector situation.

TRIAL  ADMINISTRATION  AND  TEST  REVISION 

The six tests, with item composition as shown in Table 1, were
administered at NYADS, McGuire Air Force Base, New Jersey. The
results of that testing, indicating means and standard deviations for
each position, are given in Table 2. The means are sufficiently low
in relation to maximum possible scores to be discriminating; the
standard deviations are consistent with the small Ns.

It may be considered as a maxim that item analyses for achievement
test items require an N of at least 200, and that computation of
validity and reliability coefficients requires an N of at least 75, for
the results to be meaningfully interpretable. Nevertheless, because
it was felt that some sort of psychometric information was necessary
before proceeding further in test development, such analyses were
performed on the data for INDs (N = 19) and INTs (N = 15). It was
TABLE 1

NUMBER OF ITEMS BY TYPE FOR OPERATOR POSITION
PROFICIENCY TESTS IN THE SAGE SYSTEM

OPERATOR                 ITEM TYPE
POSITION             S       C       T       TOTAL

SD                  51      56      48        155
SDT                 50      33      33        116
WD                  48      80      77        205
WDT                 50      47      50        147
IND                 46      70      82        198
INT                 42      34      47        133

TABLE 2

SUMMARY OF RESULTS OF PROFICIENCY TEST
ADMINISTRATION IN NYADS

OPERATOR
POSITION             N       MEAN       s       MAX.

SD                   4      124.3      2.6      155
SDT                  5       93.6      2.8      116
WD                   2        --        --       --
WDT                  7      113.4      7.6      147
IND                 20      149.6     11.0      198
INT                 15       94.1      7.2      133
possible to get ranking information from the superiors of 15 of the
INDs. Inasmuch as no ranker was aware of performance in all three
weapons teams, this necessitated triple ties at each rank, since each
Weapons Director ranked the INDs in his team from 1 to 5. These
rankings were used as criteria in a validity check of the total test and
each of its parts. It was not possible to get estimates of the reliability
of these rankings, since only one ranker was available for each group
of INDs. The results of the test reliability and validity analyses are
given in Table 3. Although the test reliability estimates using
Kuder-Richardson Formula 20 were quite low in some instances, the
test-part intercorrelations and external validity coefficient for the
INDs were deemed to be satisfactory evidence of the adequacy of the
test items.

The validity index (phi coefficient) and the p-value of each item
were examined for the IND and INT data. All items with negative phi
coefficients were discarded. Also, the limits 0.15 and 0.85 were
established arbitrarily, and items with p-values below the lower limit
or above the upper limit were discarded. This procedure was adopted
since items with p-values more extreme than these limits contribute
little to item reliability and test validity. No analysis was possible
for the data from the SD/SDT and WD/WDT test administrations since
too few cases were available.
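
The screening rule just described can be sketched as follows. This is a minimal illustration and not the report's actual computation: the function names are hypothetical, and the use of a high/low criterion group as the second variable in the phi coefficient is an assumption, since the report does not say how the criterion was dichotomized.

```python
import math

def p_value(item_responses):
    """Proportion of examinees passing the item (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)

def phi(item_responses, high_group):
    """Phi coefficient between passing the item and membership in the
    high criterion group (1 = high, 0 = low). The dichotomized criterion
    is an assumption made for this sketch."""
    pairs = list(zip(item_responses, high_group))
    a = sum(1 for x, h in pairs if x == 1 and h == 1)  # pass, high
    b = sum(1 for x, h in pairs if x == 1 and h == 0)  # pass, low
    c = sum(1 for x, h in pairs if x == 0 and h == 1)  # fail, high
    d = sum(1 for x, h in pairs if x == 0 and h == 0)  # fail, low
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # An item everyone passes or fails carries no information; report 0.
    return (a * d - b * c) / denom if denom else 0.0

def keep_item(item_responses, high_group, lower=0.15, upper=0.85):
    """Retain an item only if its p-value lies within the limits and
    its phi coefficient is not negative."""
    p = p_value(item_responses)
    return lower <= p <= upper and phi(item_responses, high_group) >= 0
```

With the very small Ns available, a single examinee can move an item across either limit, so a rule of this kind is best read as a coarse filter.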

A re-examination of the mission of the paper-and-pencil proficiency
tests led to the conclusion that there was no need to have separate tests
for the operator and technician at any one position. Essentially, each
is supposed to know all of the information (as distinguished from skills
and decision-making ability) required for the combined position. It
might be expected, of course, that the technicians would, on the average,
earn lower scores than the operators when both are administered the
same test. This expectation seems reasonable since operators are
commissioned officers and technicians are airmen, and the former
group is both more directly involved in the job and has the advantage
of a better educational background. It seemed to be true also that some
of the items were appropriate to all three positions in the Weapons
Branch (SD/SDT, WD/WDT, and IND/INT), some to two of these positions,
and some to only one position. Accordingly, the available items
were categorized into six parts. Part I consisted of items common to
all three positions, Part II of items for WD/WDT and IND/INT, Part III
for IND/INT only, Part IV for SD/SDT and WD/WDT, Part V for WD/WDT
only, and Part VI for SD/SDT only. The summary of the parts applicable
to each position appears in Table 4.

TABLE 4

COMPOSITION OF SAGE PROFICIENCY TESTS BY
MODULAR PARTS FOR THREE OPERATOR POSITIONS

      SD/SDT               WD/WDT               IND/INT
 Form A    Form B     Form A    Form B     Form A    Form B

  IA        IB         IA        IB         IA        IB
  IVA       IVB        IIA       IIB        IIA       IIB
  VIA       VIB        IVA       IVB        IIIA      IIIB
                       VA        VB

It is important to remember that these part numbers do not correspond
in any way to the symbology, computer and tactic breakdown of the
subject matter of the items. The part numbers merely represent modules
which can be combined in various ways to produce tests for various
positions. The combination of these parts into tests, giving the number
of items in each part, is represented in Table 5. Since alternate forms
were required, and the number of reliable, valid items was limited, it
was decided to make some items common to both of the alternate forms
required. Thus, for example, Table 5 indicates that Part I consists of
53 items for either form; however, in each form, 38 items are unique
and 15 items are common.
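
The modular assembly described above can be checked with a short sketch. The part sizes and the part lists are taken from Tables 4 and 5; the names `PARTS`, `TESTS`, and `form_composition` are hypothetical and introduced only for illustration.

```python
# Items in each modular part of one form, as (common to both forms,
# unique to the form), following Table 5.
PARTS = {
    "I": (15, 38),
    "II": (11, 6),
    "III": (6, 6),
    "IV": (11, 10),
    "V": (4, 6),
    "VI": (3, 3),
}

# Parts that make up the test for each paired position, following Table 4.
TESTS = {
    "SD/SDT": ["I", "IV", "VI"],
    "WD/WDT": ["I", "II", "IV", "V"],
    "IND/INT": ["I", "II", "III"],
}

def form_composition(position):
    """Return the number of common, unique, and total items in either
    form of the test for the given position."""
    common = sum(PARTS[part][0] for part in TESTS[position])
    unique = sum(PARTS[part][1] for part in TESTS[position])
    return {"total": common + unique, "common": common, "unique": unique}

for position in TESTS:
    print(position, form_composition(position))
```

The totals produced this way agree with the TOTAL rows of Table 5 (80, 101, and 82 items per form).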
TABLE 5

DISTRIBUTION OF ITEMS BY PARTS
FOR ALTERNATE OPERATOR TEST FORMS

OPERATOR        TEST
POSITION        PARTS      NO. ITEMS      COMMON      UNIQUE

SD/SDT          I              53           15          38
                IV             21           11          10
                VI              6            3           3
                TOTAL          80           29          51

WD/WDT          I              53           15          38
                II             17           11           6
                IV             21           11          10
                V              10            4           6
                TOTAL         101           41          60

IND/INT         I              53           15          38
                II             17           11           6
                III            12            6           6
                TOTAL          82           32          50
TEST ANALYSIS

Administration 

The revised tests were administered in the Weapons Branch at
BOADS and SYADS and to graduating students of classes in Intercept
Direction and Weapons Direction at Richards-Gebaur Air Force Base.
The N's, means,* and standard deviations for each part and the total
test are reported in Tables 6 and 7. "Parts" as used here means the
three subject matter categories of the test, i.e., symbology (S),
computer knowledge (C), and tactics (T). At BOADS almost all incumbents
took both the A and B forms. The testing order was counterbalanced so
that A and B forms were given first and second equally often. At SYADS
and Richards-Gebaur AFB, it was possible to test each incumbent with
only one form; at these sites half of the incumbents were assigned each
of the forms at random.
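
The two assignment schemes described above, random assignment of one form and counterbalanced ordering of both forms, might be sketched as below. The report does not describe the mechanics actually used; the function names and the seeded shuffle are assumptions made for illustration.

```python
import random

def assign_forms(examinees, seed=None):
    """Randomly assign half of the examinees to Form A and the rest to
    Form B. With an odd count, Form B receives the extra examinee."""
    rng = random.Random(seed)
    shuffled = list(examinees)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {name: ("A" if i < half else "B") for i, name in enumerate(shuffled)}

def counterbalanced_orders(examinees, seed=None):
    """For examinees taking both forms, give half the order AB and half
    the order BA, so each form is taken first equally often."""
    first_form = assign_forms(examinees, seed)
    return {name: ("AB" if form == "A" else "BA") for name, form in first_form.items()}

# Hypothetical roster of ten incumbents.
roster = [f"incumbent_{i}" for i in range(10)]
print(assign_forms(roster, seed=1))
print(counterbalanced_orders(roster, seed=1))
```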

Analysis  of  Variance  of  Means 

An  analysis  of  variance  was  performed  on  the  variation  of  means  by 
operators,  sectors  and  forms.  Results  are  reported  in  Table  8.  None 
of  the  main  effects  or  interactions  were  found  to  be  significant  at  the  5% 
level  of  confidence.  This  finding  may  be  interpreted  to  mean  that,  for 
the  number  of  test  scores  available,  sector  differences,  operator 
differences  and  form  differences  observable  in  Table  6  may  be  attributed 
to  random  variation  around  an  over-all  average. 

Reliability  Analyses 

A  series  of  reliability  analyses  were  performed.  These  were  of 
several  types.  First,  internal  consistency  analyses  using  the  Kuder- 
Richardson  Formula  No.  20,  or  its  equivalent  the  Hoyt  analysis  of 

* Some of the means of Table 6 have been adjusted to reflect revision
in two items which were scored erroneously.


TABLE 6

ADJUSTED MEAN SCORES ON SD/SDT, WD/WDT AND IND/INT
PROFICIENCY TESTS BY PARTS AND ALTERNATE FORMS
AT BOADS, RICHARDS-GEBAUR AFB AND SYADS

[Table body illegible in the source scan.]

Psychological Research Associates, Inc.


TABLE 7

[Title and body illegible in the source scan. Per the text, Tables 6
and 7 together report the N's, means, and standard deviations for each
part and the total test.]


TABLE 8

ANALYSIS OF VARIANCE OF IND/INT
PROFICIENCY TEST SCORES
SECTORS X OPERATORS X FORMS

SOURCE          SUM OF SQUARES      d.f.      MEAN SQUARE       F

Sectors               1362            2           681
Operators             2174            1          2174          3.67
Forms                  173            1           173
S x O                  536            2           268
O x F                    9            1             9
S x F                  191            2            95
S x O x F              817            2           408
Within              102768          169           608
Total               104321          176           593

variance technique. 1 These were computed for the total test, each
position and each form separately, by sites. Using the same technique,
the reliabilities were also computed by pooling the operator and
technician scores at each position, and then pooling these scores by
sectors in order to increase the N. Results for the internal consistency
reliability computations are presented in Table 9; results for the second
two analyses are shown in Tables 10 and 11. The small number of
cases associated with each coefficient should be considered in any
interpretation, since in some instances negative coefficients, and one
coefficient greater than unity in absolute value, occurred. The
coefficient of -1.07 is correctly computed. It is possible in either of
these methods of reliability estimation to obtain such anomalous
results. This can be


1 Hoyt, C. Test reliability obtained by analysis of variance.
Psychometrika, 1941, 6, 153-160.
TABLE 10

RELIABILITY COEFFICIENTS COMPUTED BY
KUDER-RICHARDSON FORMULA 20 FOR POOLED DATA
ON PROFICIENCY TESTING AT THREE SITES

DATA POOL                     FORM       Rtt       N

BOADS and SYADS                A         .80       15
SD and SDT                     B         .87       20

BOADS, SYADS and R-G           A         .77       34
WD and WDT                     B         .74       37

BOADS, SYADS and R-G           A         .74       92
IND and INT                    B         .76       89

TABLE 11

ALTERNATE FORM RELIABILITY COEFFICIENTS
FOR PROFICIENCY TESTING AT THREE POSITIONS IN BOADS

OPERATOR TEST          N        rAB

SD                     5        .68
SDT                    5        .58
SD and SDT            10        .60
WD                     5        .90
WDT                    4        .91
WD and WDT             9        .81
IND                   14        .60
INT                   20        .40
IND and INT           34        .64
exemplified in the Hoyt approach where it is only necessary that the
error variance exceed the variance among individuals. It should be
remembered also that this coefficient is based on an N of 3; the rational
explanation is that it is as though these 3 individuals took a test twice
and reversed the rank order of their scores, which ranged from 47 to 52.
Second, alternate form reliabilities were computed for the BOADS data,
since in that sector each incumbent took both the A and B forms of the
test. For 27 cases where test order was AB, the alternate form
reliability coefficient was 0.60; for 26 cases in which the order was
BA, the coefficient was 0.44. Third, for those items which were common
to both forms, a reliability coefficient was computed for the
test-retest results using pooled IND and INT scores from BOADS. This
coefficient was 0.59. This is interesting since it is not appreciably
better than those obtained with alternate forms when only some of the
items were common to both forms. Finally, using the INT data from BOADS
only, the reliabilities of the S, C and T parts of the test were
estimated using the analysis of variance technique of Hoyt. This
procedure is identical in result to K-R Formula 20. The obtained
reliabilities were 0.18 (S), 0.43 (C), and 0.46 (T). There is some
question, however, about the satisfaction of the assumptions required
for making this type of reliability estimate with either the KR-20 or
Hoyt technique. One such assumption is that of complete homogeneity of
item content within parts; stated in another way, that there is no
variance common to groups of items within a part. Because of this
reservation the figures given should be interpreted as lower bounds.
Unfortunately, no upper bound can be estimated.
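
The internal consistency estimates discussed above can be illustrated with a small sketch of both computations. The formulas are the standard Kuder-Richardson Formula 20 and Hoyt's analysis of variance estimate; the function names and the toy score matrices in the usage note are hypothetical and are not data from this study.

```python
def kr20(scores):
    """Kuder-Richardson Formula 20 for a list of examinee rows, each a
    list of 0/1 item scores. Population variances are used throughout."""
    n, k = len(scores), len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("all examinees have the same total score")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n  # item p-value
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)

def hoyt(scores):
    """Hoyt's estimate: 1 - (error mean square / persons mean square)
    from a persons-by-items analysis of variance."""
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores)
    correction = grand ** 2 / (n * k)
    ss_total = sum(x * x for row in scores for x in row) - correction
    ss_persons = sum(sum(row) ** 2 for row in scores) / k - correction
    ss_items = sum(sum(row[j] for row in scores) ** 2 for j in range(k)) / n - correction
    ss_error = ss_total - ss_persons - ss_items
    ms_persons = ss_persons / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return 1 - ms_error / ms_persons
```

For a matrix such as `[[1,1,1,0],[1,1,0,0],[1,0,0,0],[0,0,0,0]]` the two functions agree, which is the equivalence the text relies on. For a matrix such as `[[1,0,0],[0,1,0],[0,0,1],[1,1,0]]`, where examinees differ little in total score, both return a value below -1, which shows how a coefficient like the -1.07 mentioned in the text can arise from a very small N.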

Part Intercorrelation

Intercorrelations  were  computed  among  the  S,  C,  and  T  parts  of 
the  test.  Only  the  IND  and  INT  data  were  used  because  too  few  cases 
were  available  for  the  other  positions.  For  BOADS  the  coefficients 
represent  pooled  results  for  both  forms  of  the  test.  The  results  are 
reported  in  Table  12.  The  NYADS  data  are  from  the  pilot  administration 
TABLE 12

INTERCORRELATIONS OF PARTS OF SAGE
PROFICIENCY TESTS ADMINISTERED
TO IND's AND INT's AT DIFFERENT SITES

SITE                  PARTS       IND       INT      IND/INT

BOADS                 N            44        28         72
                      S-C         .43       .03        .35
                      S-T         .26       .05        .26
                      C-T         .29      -.21        .22

RICHARDS-GEBAUR       N            30        30         60
                      S-C         .08      -.07        .08
                      S-T         .18       .27        .26
                      C-T         .39       .17        .39

SYADS                 N            23        26         49
                      S-C         .27       .60        .44
                      S-T         .41       .31        .31
                      C-T         .63       .41        .54

NYADS*                N            15        19
                      S-C         .39       .33
                      S-T         .48       .20
                      C-T         .34       .51

ALL**                 N            97        84        181
                      S-C         .29       .31        .31
                      S-T         .33       .33        .32
                      C-T         .48       .29        .45

*  Data from preliminary longer form of test.

** Does not include NYADS data because the method of reliability
   estimation does not permit pooling results from tests with differing
   numbers of items.
with a longer test, before item analysis. If the coefficients for the
combined data based on 181 cases are taken as the best estimates of the
parameter values, then only one of the 39 other coefficients is different
to a degree significant at the 5% level of confidence. Thus any
interpretation of differences existing between operators or among
sectors is unwarranted.

Validity  Analyses 

At BOADS and SYADS it was possible to obtain a supervisor's ranking
for both IND's and INT's participating. At SYADS the ranking was
accomplished within crews; the resulting forced ties presumably
attenuated correlations between these and other data. The validity
coefficients for parts and test total are reported in Table 13. Once
again, the very small number of cases precludes adequate interpretation.
Even with the negative coefficients obtained in some cases, differences
are not significant at the 5% level of confidence for any pair of part
validity coefficients, or any pair of total validity coefficients.
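
One common way to compare two correlation coefficients from independent samples is Fisher's z transformation, sketched below. The report does not state which significance test was actually used, so this is offered only as an illustration under that assumption; the function names are hypothetical.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Standard normal deviate for the difference between two
    independent correlations, using Fisher's z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / standard_error

def differs_at_5_percent(r1, n1, r2, n2):
    """Two-tailed test at the 5% level (critical value 1.96)."""
    return abs(fisher_z_difference(r1, n1, r2, n2)) > 1.96

# Part C validity for IND's in Table 13: BOADS (-.22, N = 14)
# against SYADS (.22, N = 25).
print(fisher_z_difference(-0.22, 14, 0.22, 25))
print(differs_at_5_percent(-0.22, 14, 0.22, 25))
```

With samples this small the standard error is large, so even a negative and a positive coefficient of the size shown in Table 13 need not differ significantly.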

Additional  Items 

For  the  test  as  described,  all  items  dealing  with  aircraft  pertain  to 
the  F-106A.  To  meet  the  needs  for  other  aircraft,  depending  on  the 
sector  in  which  the  test  is  administered,  10  items  each  of  parallel  form 
have  been  prepared  for  the  F-86L,  F-89J,  F-101B,  F-102A,  and  F-104A. 
Since  these  items  have  not  been  administered,  their  psychometric 
characteristics  are  unknown.  However,  it  is  believed  that,  aside  from 
fluctuation  in  p-value  associated  with  the  relative  accessibility  of  this 
kind  of  information  in  various  sectors,  these  items  are  essentially  the 
same  as  those  used  in  Forms  A  and  B  of  the  test. 

Item  p-values 

The  percentage  of  examinees  succeeding  on  each  item  was  computed 
for  all  items  in  the  IND/INT  test,  for  IND's  and  INT's  separately  at 
each of the sites. These data are given in the Appendix. Extreme
caution should be used in interpreting these p-values, since the N's
range from 11 to 22. It will be noted that, despite the original item
analysis, 14 of the items in Form A and 11 of the items in Form B
have p-values greater than 0.85.

TABLE 13

VALIDITY COEFFICIENTS USING A RANKING
CRITERION FOR A PROFICIENCY TEST ADMINISTERED
AT BOADS AND SYADS TO IND's AND INT's

                                      TEST PART
SITE       POSITION      N        S       C       T       TOTAL

BOADS      IND          14       .21    -.22     .31       .28
SYADS      IND          25       .45     .22    -.03       .34
BOADS      INT          13       .12     .40     .13       .26
SYADS      INT          23       .37     .33     .32       .44
DISCUSSION

From  the  information  presented,  it  is  possible  to  conclude  that  the 
tests  provide  discrimination  among  individuals,  that  they  are  reliable, 
that  they  have  satisfactory  validity  against  an  external  criterion,  and 
that  the  alternate  forms  are  reasonably  equivalent.  It  is  unfortunate, 
however,  that  it  is  not  possible  to  be  definitive  in  the  interpretation  of 
differences  occurring  among  the  data.  The  differences,  between 
operator  and  technician  in  the  same  position,  between  different  sectors 
for  the  same  position,  and  between  the  alternate  forms,  were  not 
statistically  significant,  using  the  IND/INT  data.  If  the  failure  to  reach 
significance  is  a  function  of  the  size  of  the  samples,  this  will  never  be 
known  since,  in  each  instance,  the  samples  include  all  of  the  available 
incumbents. 

In the opinion of the writer, real differences do exist among the
data, and it is believed that these would become evident if there were
larger samples. For example, in every one of the 6 comparisons
possible between an IND and an INT on total score, the operator had the
higher score. Since, if operator and technician test performances were
really equal, the cited event could be expected to occur by chance only
1 time in 64, it is reasonable to conclude that the operators have more
test knowledge by some slight amount. It is true also that for the SD
vs SDT and WD vs WDT comparisons, the operator has a higher score
in each case. This event could be expected by chance only 1 time in
2,048, so again it is reasonable to conclude that operators have the
greater amount of test knowledge.

Similarly, in comparing sectors, in every one of the 36 possible
comparisons on part scores, and the 12 possible comparisons on total
scores, the BOADS incumbents scored higher than those in SYADS. In
comparing BOADS and Richards-Gebaur incumbents, the BOADS personnel
have higher scores on part-scores in 23 of 24 instances, and higher
total scores in every one of 8 instances. In contrast, for
Richards-Gebaur vs SYADS, the Richards-Gebaur personnel are higher in
14 of 24 part-score comparisons, and in 6 of 8 total score comparisons.
These results are consistent with chance fluctuation from a true equal
score value, although the RG sample consists of students and the SYADS
sample is operating personnel. It seems reasonable to conclude,
therefore, that SYADS and Richards-Gebaur personnel have less of
the knowledge measured by the test than do the personnel at BOADS.
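
The chance probabilities quoted in this discussion follow from treating each comparison as an independent fair coin toss, which can be sketched as below. The function name is hypothetical, the figure of 11 comparisons for the SD/SDT and WD/WDT case is inferred from the stated odds of 1 in 2,048, and the independence of the comparisons is itself an assumption (part scores and total scores from the same examinees are not independent).

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more successes in n independent trials,
    each with success probability p."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Operator higher than technician in all 6 IND/INT comparisons.
print(prob_at_least(6, 6))      # equals 1/64
# All of the (assumed) 11 SD/SDT and WD/WDT comparisons.
print(prob_at_least(11, 11))    # equals 1/2048
# Richards-Gebaur higher than SYADS in 14 of 24 part-score comparisons.
print(prob_at_least(14, 24))
# BOADS higher than Richards-Gebaur in 23 of 24 part-score comparisons.
print(prob_at_least(23, 24))
```

Under these assumptions the 14-of-24 result is well within chance expectation, while the 23-of-24 result is not, which matches the contrast drawn in the text.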

It is suggested on the basis of the foregoing discussion that the data
on test means from the BOADS testing together with the variability data
based on the pooled results from all three sites be used for normative
purposes. A lower bound of acceptability might be defined as 2 standard
deviations below the mean for any particular position. Such a score
would compare approximately with a stanine score of 1. For example,
IND's might be required to earn a total score of 48. Using the average
standard deviation for both forms of the IND test (5.75), the
computation would be 59.59 (the mean for both forms of the IND test)
minus 11.50 (2 x 5.75) = 48.09. Application of the cut-off technique to
the S, C, and T part-scores of the test is not recommended at this time.
Although the part reliabilities of 0.18, 0.43 and 0.46 are lower bounds,
there is no evidence in the absence of further data collection that
part-scores are sufficiently reliable for individual test interpretation.
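
The cut-off computation above can be written as a short sketch. The mean and standard deviation are the IND figures quoted in the text; the function names are hypothetical.

```python
def cutoff_score(mean, sd, n_sd=2.0):
    """Lower bound of acceptability: n_sd standard deviations below the mean."""
    return mean - n_sd * sd

def is_acceptable(score, mean, sd, n_sd=2.0):
    """True if the score is at or above the cut-off."""
    return score >= cutoff_score(mean, sd, n_sd)

# IND test: mean 59.59 and average standard deviation 5.75 for both forms.
ind_cutoff = cutoff_score(59.59, 5.75)
print(round(ind_cutoff, 2))   # 48.09
print(is_acceptable(47, 59.59, 5.75), is_acceptable(49, 59.59, 5.75))
```

The same function applies to the other positions once their BOADS means and pooled standard deviations are substituted.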



APPENDIX 